The New Thot Toolkit for Fully-Automatic and Interactive Statistical Machine Translation
Abstract
We present the new THOT toolkit for fully automatic and interactive statistical machine translation (SMT). Initial public versions of THOT date back to 2005 and included only the estimation of phrase-based models. By contrast, the new version offers several features that had not previously been incorporated. The key innovations provided by the toolkit are computer-aided translation, including post-editing and interactive SMT; incremental learning; and robust generation of alignments at the phrase level. The toolkit also provides standard SMT features such as fully automatic translation, scalable and parallel algorithms for model training, and a client-server implementation of the translation functionality. The toolkit can be compiled on Unix-like and Windows platforms and is released under the GNU Lesser General Public License (LGPL).
Similar resources
Phrasal: A Toolkit for New Directions in Statistical Machine Translation
We present a new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding s...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are at the core of Machine Translation (MT) engines, as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Online Learning for Statistical Machine Translation
We present online learning techniques for statistical machine translation (SMT). The availability of large training data sets that grow constantly over time is becoming more and more frequent in the field of SMT—for example, in the context of translation agencies or the daily translation of government proceedings. When new knowledge is to be incorporated in the SMT models, the use of batch lear...
Improving and Developing an English-to-Persian Computer-Assisted Translation System
In recent years, significant improvements have been achieved in statistical machine translation (SMT), but even the best machine translation technology is still far from replacing or even competing with human translators. Another way to increase the productivity of the translation process is a computer-assisted translation (CAT) system. In a CAT system, the human translator begins to type the tra...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...